ABSTRACT

This report compares maximum linear prediction,
maximum total correct classifications for a group, and maximum
probability of correct classification for an individual as objective
criteria for univariate grading scales. Since the goals of valid
prediction and valid classification lead to conflicting criteria, it
is possible that a compromise measure consisting of a standardized
score with a variable pass-fail point may provide the best measure
available. (Author/Pri)
OBJECTIVE CRITERIA FOR EVALUATION OF GRADING SCALES



The wide use of sophisticated multivariate measures sometimes leads
us to overlook possible margins for improvement in the component
univariate measures. The academic grade is, from one standpoint, an
example of such a univariate measure. A single grade is typically used
to summarize a multivariate collection of measurements and judgments of
student behavior in a given course. Frequently, there is some
intermediate measure such as the numerical sum of scores given on various
tests and evaluations. The objective here will be to review some ways
to assess the loss in information contained in these intermediate
univariate measures through transformation to suboptimal univariate grade
scales. The loss of information due to combination of multivariate
measures into a univariate measure will not be covered. In the course
of this review, explicit application will be made of some representative
criteria which appear to be implicitly used by those who have
selected certain presently used grade scales.

Criteria appear to be of two types: (1) those leading to grades
which imply a continuum of performance and (2) those which imply
discrete categories of performance. An example of criteria which imply a
continuum of performance is the product-moment correlation coefficient
as used by Willingham (1964) to measure effects of use of different
grading scales on validity coefficients. Simple objective criteria of
categorizations are more difficult to find but measures such as the
total number of correct categorizations are representative.

Since some type of 5-category grading scale (e.g. A-B-C-D-E) is
probably most widely used in U.S. colleges, we will begin by evaluating
a selection of such scales.

Category Size 

One approach to the translation of test scores into grade categories
is to define the grade category size in terms of standard deviation
(s.d.) units of the observed score distribution. If the underlying
distribution is normal it is obvious that only the middle three categories
of a 5-category scale can conform to the specified size. Since the normal
distribution is unbounded the two "outer" categories must each have one
bound unspecified so as to include all extreme values. If we decrease
the size of the three central categories we find that as the category
size approaches zero, the central categories tend towards insignificant
size and we obtain a 2-category scale (the extreme categories) as
a limiting case. On the other hand, with increasing central category
size (hereafter abbreviated "CCS") we find that more and more
observations fall into the center category, and a 1-category scale is
ultimately approached. As can be seen in Table 1, by the time a CCS of
about 1.6 s.d. is reached the grade scale has become effectively a
3-category scale.

TABLE 1

Distribution of Ss on 5-Category Scales
of Varying Central Category Sizes

                      Category

 CCS       A       B       C       D       E

  .0      .50     .00     .00     .00     .50
  .2      .38     .08     .08     .08     .38
  .4      .27     .15     .16     .15     .27
  .6      .18     .20     .24     .20     .18
  .8      .12     .23     .31     .23     .12
 1.0      .07     .24     .38     .24     .07
 1.2      .04     .24     .45     .24     .04
 1.4      .02     .22     .52     .22     .02
 1.6      .01     .20     .58     .20     .01

Note. CCS is in standard deviation units.



An application of the criteria mentioned above would be the selection of
one of the category sizes in this range as optimal for some related
purpose.
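
The entries of Table 1 follow from the normal distribution function. The following is a minimal sketch in Python, assuming the category boundaries lie at ±0.5 and ±1.5 CCS around the mean; the function names are illustrative and do not come from the original report.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def category_proportions(ccs):
    """Proportion of Ss in each of the 5 categories, lowest to highest.

    Assumption: the three central categories each have width `ccs`
    (in s.d. units) and are centered on the mean, so the cut points
    are -1.5, -0.5, 0.5 and 1.5 times `ccs`.
    """
    cuts = [-1.5 * ccs, -0.5 * ccs, 0.5 * ccs, 1.5 * ccs]
    cumulative = [0.0] + [normal_cdf(c) for c in cuts] + [1.0]
    return [cumulative[i + 1] - cumulative[i] for i in range(5)]


if __name__ == "__main__":
    for ccs in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6):
        row = "  ".join(f"{p:.2f}" for p in category_proportions(ccs))
        print(f"{ccs:.1f}  {row}")
```

Because the scale is symmetric, the returned order (E to A) reads the same as the A to E order of Table 1. Rounded to two places, the CCS = 1.0 row comes out as .07, .24, .38, .24, .07.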
Grade Scaling Model

By use of Pearson's tables of volumes under a bivariate normal
surface for different values of ρ (Pearson, 1931, Tables VIII and IX),
it is possible to construct 5 x 5 contingency tables which represent
the number of joint categorizations on two scales having a population
correlation of ρ and a given sample size N. Such tables were
constructed for N = 1,000,000, ρ = 0, .50, .70, .95, and 1.00 and for the
nine CCS values used in Table 1. Table 2 shows one such table. In each
case it is assumed that the scores on the two measures being compared
are based on an underlying continuous bivariate normal distribution with
parameter ρ. In the following discussion examples using ρ = .95
will be emphasized since this is close to the value of .94 suggested
by Kelly (1927) as the minimum degree of relationship necessary for
measures of individual accomplishment.

TABLE 2

Sample Contingency Table

Joint Distribution of Scores on Two Scales

 A          0         0        16     16237     50554
 B          0        27     44512    180954     16237
 C         16     44512    293870     44512        16
 D      16237    180954     44512        27         0
 E      50554     16237        16         0         0

            E         D         C         B         A

Note. ρ = .95, N = 1,000,000, CCS = 1.0
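
A table of this kind can also be approximated numerically rather than from Pearson's printed tables. The sketch below is an assumption-laden substitute for the table lookup, not the procedure used in the report: it integrates the bivariate normal density over each cell with Simpson's rule, using cut points at ±0.5 and ±1.5 CCS on both scales. All function names and the `tail` and `steps` parameters are hypothetical.

```python
from math import erf, exp, inf, pi, sqrt


def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def joint_cell(x_lo, x_hi, y_lo, y_hi, rho, steps=2000):
    """P(x_lo < X < x_hi, y_lo < Y < y_hi) for standard bivariate normal.

    Uses Y | X = x ~ Normal(rho * x, 1 - rho**2) and Simpson's rule over x.
    Requires |rho| < 1 and an even number of steps.
    """
    s = sqrt(1.0 - rho * rho)

    def integrand(x):
        upper = normal_cdf((y_hi - rho * x) / s)
        lower = normal_cdf((y_lo - rho * x) / s)
        return normal_pdf(x) * (upper - lower)

    h = (x_hi - x_lo) / steps
    total = integrand(x_lo) + integrand(x_hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_lo + i * h)
    return total * h / 3.0


def contingency_table(rho, ccs, tail=8.0):
    """5 x 5 table of joint proportions; index 0 is the lowest category.

    Assumption: the outer x categories are truncated at +/- `tail` s.d.,
    which discards a negligible amount of probability.
    """
    cuts = [-1.5 * ccs, -0.5 * ccs, 0.5 * ccs, 1.5 * ccs]
    x_edges = [-tail] + cuts + [tail]
    y_edges = [-inf] + cuts + [inf]
    return [
        [joint_cell(x_edges[i], x_edges[i + 1], y_edges[j], y_edges[j + 1], rho)
         for j in range(5)]
        for i in range(5)
    ]


if __name__ == "__main__":
    table = contingency_table(rho=0.95, ccs=1.0)
    for row in table:
        print("  ".join(f"{round(1_000_000 * p):7d}" for p in row))
```

Multiplying each proportion by N = 1,000,000 gives counts that can be compared with Table 2; small differences from the printed values are to be expected because the report relied on tabled volumes.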

Correlation 

For each contingency table we can compute the product-moment
correlation coefficient, r. Figure 1 shows the relationship of r to CCS
for ρ = .95. It will be noted that r is less than ρ for all values of
CCS. There are "coarse grouping" formulas available for correcting r
as an estimate of ρ (e.g., Peters & Van Voorhis, 1940) but aside from
the fact that they produce poor approximations as ρ approaches unity
(Kelly, 1947), they are of little value here since our interests lie
in the relationship exemplified by r rather than by ρ.



If r is to be maximized, Figure 1 indicates the optimum value of
CCS (for ρ = .95) is about .4 s.d. For lower values of ρ the optimum
value of CCS increases and at ρ = .50 it is about .8 s.d.
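
As an illustration of how r is obtained from a single contingency table, the sketch below computes it for the counts printed in Table 2. It assumes that the five grades are scored as equally spaced integers (E = 1 through A = 5), which the report does not state explicitly; the function name is illustrative.

```python
from math import sqrt

# Counts from Table 2 (rho = .95, N = 1,000,000, CCS = 1.0).
# Rows run A..E on one scale; columns run E..A on the other, as printed.
TABLE_2 = [
    [0, 0, 16, 16237, 50554],
    [0, 27, 44512, 180954, 16237],
    [16, 44512, 293870, 44512, 16],
    [16237, 180954, 44512, 27, 0],
    [50554, 16237, 16, 0, 0],
]
ROW_SCORES = [5, 4, 3, 2, 1]  # A..E (assumed integer scoring)
COL_SCORES = [1, 2, 3, 4, 5]  # E..A (assumed integer scoring)


def product_moment_r(table, row_scores, col_scores):
    """Pearson r for grouped data given as a table of frequencies."""
    n = sum(sum(row) for row in table)
    sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0.0
    for x, row in zip(row_scores, table):
        for y, count in zip(col_scores, row):
            sum_x += count * x
            sum_y += count * y
            sum_xx += count * x * x
            sum_yy += count * y * y
            sum_xy += count * x * y
    cov = sum_xy / n - (sum_x / n) * (sum_y / n)
    var_x = sum_xx / n - (sum_x / n) ** 2
    var_y = sum_yy / n - (sum_y / n) ** 2
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(f"r = {product_moment_r(TABLE_2, ROW_SCORES, COL_SCORES):.3f}")
```

Under this scoring assumption the Table 2 counts give r of about .88, below the underlying ρ of .95, which illustrates the loss due to coarse grouping at CCS = 1.0.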






The actual interpretation of r^ of course depends on how the two 

scales represented by th^e contingency tab lei^^ If the two 
iicaiesiS^^pte 

coe0fciS^V^^f\^ r^L^^.} &resh^ 
maj^g^^ade)^ 

sSnlor gra^^^wS^ "rlmjgef^falS 



tite; ^magni^ti^ ncatej^rie^^^ 
use^v Henjbe^^;^i^ 

s ired aivd ;th^^ is no mil : 4 

scales If i^th^i^ge n uSBer ^^q^^^^ ii9€5y 
has poinjSe«d jut ^t^s^i^^>fs£^ 

ef f eict of unreli^le-TOasure^^^ redificingjthe n of grade citegoriesS 

teductlpn of the rmri)^ the rell^iiirty oi^ 

the iniet^ure still m^ - 



Total Correct Classifications




The diagonal cells in Table 2 for which the same grade occurs on
both scales contain all of the correct classifications. The proportion
of observations falling in these cells, T, is a second criterion for
evaluating grading scales. [illegible] that although this criterion
varies directly with [illegible] of questionable value. Figure 1 shows
[illegible] ρ = .95. [illegible] maximum and minimum occur at
[illegible] for T when CCS is zero (the 2-category scale) or infinite
(the 1-category scale limiting condition). The latter maximum is unity
for all ρ (i.e., it is impossible to misclassify when only one category
is available) [illegible]. In general the value of T (unlike r) is
inversely related to the number of categories used. Hence, if the aim
is to maximize the number of correct classifications one should use
[illegible].
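
The proportion of correct classifications can be read directly off the Table 2 counts. The sketch below computes it for the 5-category table and, as an illustration of the effect of using fewer categories, for a 3-category version formed by merging A with B and D with E. That particular merging is an assumption made here for illustration and is not taken from the report; the function names are likewise hypothetical.

```python
# Counts from Table 2 (rho = .95, N = 1,000,000, CCS = 1.0).
# Rows run A..E on one scale; columns run E..A on the other, as printed.
TABLE_2 = [
    [0, 0, 16, 16237, 50554],
    [0, 27, 44512, 180954, 16237],
    [16, 44512, 293870, 44512, 16],
    [16237, 180954, 44512, 27, 0],
    [50554, 16237, 16, 0, 0],
]


def align(table):
    """Reverse each row so that columns run A..E, matching the row order."""
    return [row[::-1] for row in table]


def total_correct(square):
    """Proportion of observations given the same grade on both scales."""
    n = sum(sum(row) for row in square)
    return sum(square[i][i] for i in range(len(square))) / n


def collapse(square, groups):
    """Merge categories; `groups` lists the original indices in each new one."""
    return [
        [sum(square[i][j] for i in gi for j in gj) for gj in groups]
        for gi in groups
    ]


aligned = align(TABLE_2)
t_five = total_correct(aligned)
# Hypothetical 3-category scale: {A, B}, {C}, {D, E}.
t_three = total_correct(collapse(aligned, [[0, 1], [2], [3, 4]]))

if __name__ == "__main__":
    print(f"5 categories: {t_five:.3f}")
    print(f"3 categories: {t_three:.3f}")
```

For these counts the proportion is about .757 with five categories and about .822 after merging to three, consistent with the statement that fewer categories yield more correct classifications.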



-J 



J A majqr :Sr^ 

ggradJiijpisc^^^^^ 3-categot^ (evgv 

?^ TABLE 3 t i : > - 




receiving the fflgidil^ ^^g ^l^^^^ ^^^^ 
scale. P(Bj^ | B^^^^K^ ^^l B a^^ H 
bilirtyAQfgr%ceigd.^^B ^^fjM 



1= 




^g -C) scales. The scale witl^ ^^^ ff^^^^^g^^1^|te^^j^Be^gf 
However, be cause AS^^^faf^^l 



:er P. 



wi*h- 



a p robabil:^ 3^^^ 1 y^^^yQj^^ln ^ivl a withy^^^^^g^^oX^i^^^ 
th rjg_e jtime^^^gjlL^l^ ^^^^^^fom ob s e rve d s cTre^5^ ^A"Co Jhl^:^ 



gisjtfMhaveB^ t e rlgigS&g&ii mumgtjbnt^ 

fgU^eiagf Sl^yi fd^Ki?^^ 



p robabl^^^^f^n^^^ 
Sejcjgcfi^^MgOTion for individuals falling into central categ^rieil^wfer^^^ 
is siSllT On the other hand, when CCS is large, high is fsI^glBj 
^^acl^g almost everyone in the middle category, thus r^^lnlng^:i^pin^> f 
^^s^Sthe inf orma^^i^ availabl^tngth^ttrigl^a^ Ti^^ 




Table 4 is the validity matrix [illegible] from the information
[illegible]. Table 4 is simply a table of the conditional
probabilities [illegible] in any category of one scale, given that it
is in [illegible] (e.g. the probability of receiving an "A", "B", "C",
"D" or "E" in course II given that a "B" [illegible]). The values of
the cells on the major diagonal are the [illegible] for both scales
[illegible] of receiving the same [illegible] criterion. Like r and T,
P_MC increases when the value of ρ increases.



B^g^^ir^gMgiroms^ 

^whan>fpj;as5iz^^ 



when^the~GGSg^li3gab:Qut^lg^sBrig ^a!hl'3 MO p:&lmi ^a^^ 

1 which coropafes^^^^p ™aia^^l^^^^B§§^^^B SStjESi 



TABLE 4

Sample Validity Matrix

Distribution of Scores on Scale II for Each
Score on Scale I

 A      .0000     .0000     .0002     .2430     .7567
 B      .0000     .0001     .1841     .7486     .0672
 C      .0000     .1162     .7674     .1162     .0000
 D      .0672     .7486     .1841     .0001     .0000
 E      .7567     .2430     .0002     .0000     .0000

            E         D         C         B         A




iNoteJi^Sgjij^fJJag^^ 
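
A matrix of conditional probabilities of this kind can be obtained by dividing each row of a contingency table by its row total. The sketch below does this for the Table 2 counts; it is an illustration under that assumption, and the function name is not from the report.

```python
# Counts from Table 2 (rho = .95, N = 1,000,000, CCS = 1.0).
# Rows run A..E on Scale I; columns run E..A on Scale II, as printed.
GRADES = ["A", "B", "C", "D", "E"]
TABLE_2 = [
    [0, 0, 16, 16237, 50554],
    [0, 27, 44512, 180954, 16237],
    [16, 44512, 293870, 44512, 16],
    [16237, 180954, 44512, 27, 0],
    [50554, 16237, 16, 0, 0],
]


def validity_matrix(table):
    """Row-conditional probabilities: P(Scale II grade | Scale I grade)."""
    return [[cell / sum(row) for cell in row] for row in table]


matrix = validity_matrix(TABLE_2)

if __name__ == "__main__":
    for grade, row in zip(GRADES, matrix):
        print(grade, "  ".join(f"{p:.4f}" for p in row))
```

With columns in E to A order, the probability of receiving the same grade on both scales sits on the diagonal running from upper right to lower left, for example about .7486 for a "B".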



[illegible], unlike [illegible] of [illegible] occupancy. Being a
function of the [illegible] classifications, [illegible] is of greatest
value as a criterion for [illegible] decisions while P_MC, giving equal
weight to all categories, is of [illegible] value as a criterion for
making individual decisions. [illegible] the total number of
misclassifications [illegible] the probability that [illegible] should
be used.

Unfortunately T and P_MC lead to the same [illegible] only in the
special case of uniform distribution of Ss across all categories.
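
The relation between the two categorization criteria can be written out explicitly. The sketch below assumes, on the basis of the surviving text, that P_MC is the unweighted mean of the diagonal conditional probabilities of the validity matrix; the notation P(i) and P(i | i) is introduced here for illustration and is not the report's.

```latex
% Assumed definitions (n = number of categories):
%   P(i)       proportion of Ss in category i on the first scale
%   P(i \mid i)  probability of category i on the second scale,
%              given category i on the first
\begin{align*}
  T      &= \sum_{i=1}^{n} P(i)\, P(i \mid i), \\
  P_{MC} &= \frac{1}{n} \sum_{i=1}^{n} P(i \mid i).
\end{align*}
% If P(i) = 1/n for every i, the weights coincide and T = P_{MC}.
% For the Table 2 counts (\rho = .95, CCS = 1.0):
%   T \approx .757, \qquad P_{MC} \approx .756 .
```

T weights each diagonal probability by category occupancy, whereas the assumed P_MC weights all categories equally, so the two agree exactly only when every category holds the same proportion of Ss.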



Comparison of Criteria




- When an at temp t is under the 

T^sumnrtloif of anv^uriderly^^ 
giH^zationj\^ 

^^jacce|L^bl^3K@iK^ 

^hl? op^:ynjgg^g i^^^ ^ctatedgSj^us^l^i^^j^r^^fife^gt^^ of ^gp^ 

Even at face value, neither of the two categorization criteria
would be likely to be acceptable as the sole criterion for development of
a scale. Categorization is generally done for some immediate purpose,
e.g. to determine those who must repeat the course, those who may
continue with the next course, and those who are so advanced that they can
skip to beyond the next course. The [illegible] criterion, if
followed alone, would provide reliability without guaranteeing [illegible]



e 
i 



of Ss to be "pVSSiiiWilt W^X& SlM^i^i^fft^ ^^at^^i zation crit^^igj; 
gre more useful In evsLLM€±r^^piace^S^^^^^^^^^^^^^^^^^^ 



n 

'mc 



2 

.90 



3 
.82 
.89 



.89 



.91 .93 .93 



i 



i 



Note.~CCS»*l for all P^p and CCS».4 for all jc. p « .gSft 
It is apparent from Table 5 that P_MC dictates the use of scales having a

minimum n while r dictates the use of scales which have a maximum n.

I 1? Since i t is a^^ 

I "having fewer categgriesv C^> g; ^A-B-C-^Dr E becomes- 

i I? -^'Eaii**) J a^cire^^v^i^ 

I / reduced:^to Mrj^ 

? f7 true« Hence, tf^ either^ 

J * ^:V^e ^requS^e^^ 

I n /rSSpracticaI^^>^^ V 

L := i ^ ^^ ^haye ia^-scaffe 



=this-i^ glg umi8^ 



use« 



Reference to Table 5 indi ca^teg ^^g^^fffi^^ ^^s^^^ ^SlWj ^fS g - 
4n>7 (For p - .95^ CCS * .4j go i^^^dS^^ ^MS Mnj^^^^^ ^^^^^ Ml^S s 
^oSly^^b:qti4tmOfc^)^t^ 



inviwsin^extrss^^ 

gresight^-^catcrg^ 



^(^jt^^l^^Jt^iSll^t^g^^b^p suppcjf^bgd ^fl^fe^ jg i^^^jig^^g^ 



aw^modalv^^^iffi^ ps^i ^B^^^^^^M ^^^^S fy^^^t^ a^ 
Conclusions 

Neither the 5-category nor the 3-category scale is optimal for
either the prediction or the usual (pass-fail) categorization problem.
Every single-score grading system which must provide information for
both of these objectives is a compromise. The "pass-fail" marking
systems reduce the task of the instructor to one critical decision
rather than the four required for a 5-category system but at
the same time act as a filter for much of the predictive information
available from classroom measures. "Curving" grades to fit a normal
distribution with a predetermined pass-fail point removes the
requirement for any instructor's decision [illegible] a fair
amount of linear prediction [illegible] the cost of [illegible] pass-fail

I'^^^/Z^^cit^ot^^^ scales wlfi/^^hou Id do^^we^e^Se 



^i^r^ht^gsj^-f^^a^^ 

^Jp^^^g^^feith^^ 
g§gggtl^ngrmight ^^gaii^^j^^^.^^^,3^^^J^S^^s^lS^J 
ign^g^ a^t .^^g^ffi^^^ 






^rtcyr^^^^^^^E-^ar^^i^^ ^ 
Jiayfi^^^ 

c6urd;:^r^^^^^iS^ 

>JC«^vedBwiS^ r T 



Appendix*

Suppose we wish to examine the degree to which grades given in the
first semester of a 2-semester course correlate with grades given in
the second semester of the course. In addition we wish to determine
what effect expressing the grades in terms of stanine scores might have
on such a validity coefficient.



Four pairs of classes were examined. In each case a raw score
based on the sum of scores on all tests given during the semester was
available. A stanine transformation was applied to
these raw scores for each of the eight classes. Correlations
between raw scores, between grades, and between stanines were then
determined for those students who had attended both semesters. Results
are shown in Table A1.

Table A1

34 ,71 ,57 

64 ,59 ,66 

18 .33 ,01 

^ 21 ,72 .78.^^ 



^oufse^ V 

jgraduate^: 
luate 



-Undergrajd 



A-B-C 
A-B-C J 
A-B 

A-B-C-D-E 



stanine^ 

• 70 

• 72 

• 24 

' .76 



- The*data:^represent^^m^ Most 

^videni;^arei>ffgjc^^ in 

:=the grarduate cbUrsje in Even 

^in the^^mder^ took the 

first is^m^Jter^^^^^^ feheT se\:pnd^s^^ AsnSt ght- be ex- 

pected ithig^ ia^ stu<i^tS;?ha^ 

^di^ stai^ ^^yxe^^^i^r^igy^yi^^ aiem^sfer student. ^ 

-In fSddi tlon to the : atte^Si^fci 3tfJ e rj^^^ 

■ j^ewed 4§t PTp4y ced -nonH^eai^^ of regression rl^r^elr^ral^^^^^^^ - 



for either grades or raw scores.

While stanine scores did not show uniformly highest validity
coefficients for these data they did account for an average of about
6% more variance than grades did. This improvement over grades is
probably due to the joint effect of increased numbers of categories
together with reduction in nonlinearity of regression. A linear
transformation of raw scores to a 9-category scale would not be
expected to have the latter benefit. An additional drawback of
a simple linear transformation is that it requires more work than
the stanine transformation.
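
For readers unfamiliar with the stanine transformation, the following is a minimal sketch in Python. It assumes the conventional stanine percentages of 4, 7, 12, 17, 20, 17, 12, 7 and 4 and assigns each raw score by its midpoint percentile rank within its own class; the handling of ties and the function name are choices made here and are not described in the report.

```python
from bisect import bisect_left, bisect_right

# Cumulative percentages that separate stanines 1..9 under the
# conventional 4-7-12-17-20-17-12-7-4 split.
CUMULATIVE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanines(raw_scores):
    """Convert raw scores to stanines (1 = lowest, 9 = highest).

    Assumption: each score's percentile rank is the percentage of scores
    below it plus half the percentage tied with it, so tied raw scores
    always receive the same stanine.
    """
    ordered = sorted(raw_scores)
    n = len(ordered)
    result = []
    for score in raw_scores:
        below = bisect_left(ordered, score)
        tied = bisect_right(ordered, score) - below
        percentile = 100.0 * (below + 0.5 * tied) / n
        result.append(1 + bisect_right(CUMULATIVE_CUTS, percentile))
    return result


if __name__ == "__main__":
    example = list(range(1, 101))  # 100 distinct hypothetical raw scores
    assigned = stanines(example)
    print([assigned.count(s) for s in range(1, 10)])
```

Because the transformation depends only on rank order, it reshapes a skewed raw-score distribution toward a symmetric one, which is the property the discussion above credits with reducing nonlinearity of regression.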






[rrtcal^da t a we gehe^usly^ ^p rby idgd iby^r^es^c^ Sv:^^|feXi 
3iV Hv ^atchel^ier^ aid ^I^piuf^ oj^tlfe^^ btf^IlStiaM:^^ 


